Information storage and retrieval system and method

ABSTRACT

An information storage and retrieval system and method capable of handling queries in sentence form and presenting responses either as identification numbers of potentially pertinent documents or as documents in full text form. Stored documents containing so-called search symbols are known to the system. A probability calculator determines the likelihood of the occurrence of a search symbol at least once in a given stored document in the system's data base. The system and method cause the query text to be scanned so as to determine which search symbols are contained therein. The term "overlap" is used to designate a search symbol that appears both in the query and in a given stored document. The particular document in the system's data base having the smallest joint probability of occurrence of overlap search symbols is designated as having the highest relevance potential within the data base to a given query. The stored document having the next larger joint probability of occurrence of overlap search symbols has the next highest relevance potential. In this manner, any selected number of relevant stored documents may be output by the system and method, in the order of relevance, as either identification numbers of potentially pertinent documents or as the documents per se in full text form.
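The ranking rule stated in the abstract can be illustrated with a short Python sketch. It is not the disclosed implementation; it rests on assumptions the abstract does not state. The probability that a search symbol occurs at least once in a document is estimated here as the fraction of stored documents containing it. The joint probability of the overlap symbols is taken as the product of the individual probabilities, which assumes the symbols occur independently. The function names, the tokenizer, and the sample data are hypothetical.

```python
import re
from math import prod


def occurrence_probabilities(documents):
    """Estimate P(symbol occurs at least once in a stored document).

    Assumption: the estimate is the fraction of stored documents that
    contain the symbol. The abstract does not say how it is computed.
    """
    total = len(documents)
    counts = {}
    for symbols in documents.values():
        for symbol in symbols:
            counts[symbol] = counts.get(symbol, 0) + 1
    return {symbol: count / total for symbol, count in counts.items()}


def rank_documents(query_text, documents, top_k=None):
    """Return (document id, joint probability) pairs, most relevant first."""
    probs = occurrence_probabilities(documents)
    # Scan the query text and keep only the words that are known search symbols.
    words = re.findall(r"[a-z]+", query_text.lower())
    query_symbols = {word for word in words if word in probs}
    scored = []
    for doc_id, symbols in documents.items():
        overlap = query_symbols & symbols  # symbols in both query and document
        if not overlap:
            continue  # no overlap symbols, so the document is not a candidate
        # Assumption: independence, so the joint probability is a product.
        joint = prod(probs[symbol] for symbol in overlap)
        scored.append((joint, doc_id))
    scored.sort()  # smallest joint probability = highest relevance potential
    return [(doc_id, joint) for joint, doc_id in scored[:top_k]]


# Hypothetical data base: document id -> set of search symbols it contains.
documents = {
    "DOC1": {"radar", "antenna"},
    "DOC2": {"radar", "laser"},
    "DOC3": {"antenna", "signal"},
    "DOC4": {"radar", "signal"},
}

print(rank_documents("Which radar uses a laser antenna?", documents))
# → [('DOC2', 0.1875), ('DOC1', 0.375), ('DOC3', 0.5), ('DOC4', 0.75)]
```

In this sketch DOC2 ranks first because it shares the rare symbol "laser" with the query, which makes its overlap the least likely to arise by chance. Passing `top_k` returns only the selected number of documents.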

DEFENSIVE PUBLICATION
UNITED STATES PATENT OFFICE
Published at the request of the applicant or owner in accordance with the Notice of Dec. 16, 1969, 869 O.G. 687. The abstracts of Defensive Publication applications are identified by distinctly numbered series and are arranged chronologically. The heading of each abstract indicates the number of pages of specification, including claims and sheets of drawings contained in the application as originally filed. The files of these applications are available to the public for inspection, and reproductions may be purchased for 30 cents a sheet. Defensive Publication applications have not been examined as to the merits of alleged invention. The Patent Office makes no assertion as to the novelty of the disclosed subject matter.

PUBLISHED JULY 18, 1972
T900,006
INFORMATION STORAGE AND RETRIEVAL SYSTEM AND METHOD
Matthews P. Perriens, Rockville, Md., and John H. Williams, Jr., Annandale, Va., assignors to International Business Machines Corporation, Armonk, N.Y.
Continuation of application Ser. No. 736,837, June 13, 1968. This application Apr. 19, 1971, Ser. No. 135,467
Int. Cl. G06f 1/00, 7/00, 15/00
U.S. Cl. 340-172.5
3 Sheets Drawing. 21 Pages Specification

July 18, 1972. Original filed June 13, 1968. 3 Sheets, Sheet 1.
[FIG. 1: block diagram of the system. Label text that remains legible (the grouping of words into block labels is inferred from the layout): Input, Concordance, Subset List Generator, Joint Probability List Generator, Probability Calculator, List Inverter, List Rearranger, Scanner, Document Selector, Output. Inventors: Matthews P. Perriens and John H. Williams, Jr.]

3 Sheets, Sheet 2.
[FIG. 2: block diagram of a computer. Legible labels: Input, Output, Input Output Channel, Core Storage, Arithmetic Unit, Control Unit.]
[FIG. 3: concordance table with the columns Search Symbols and Document Numbers. The rest of the sheet shows a table headed Documents and a list headed List I. The individual SS and DOC entries are not recoverable from the scan.]
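The concordance in the drawing pairs each search symbol with the numbers of the documents that contain it. A minimal Python sketch of such a structure follows, together with one way it could be used to find the overlap symbols for each document. The function names and the three sample documents are hypothetical; they do not reproduce the entries of the drawing, which are illegible in the scan.

```python
from collections import defaultdict


def build_concordance(documents):
    """Map each search symbol to the sorted document numbers containing it."""
    concordance = defaultdict(set)
    for doc_number, symbols in documents.items():
        for symbol in symbols:
            concordance[symbol].add(doc_number)
    return {symbol: sorted(docs) for symbol, docs in concordance.items()}


def overlap_by_document(concordance, query_symbols):
    """For each document, collect the query symbols it also contains."""
    overlap = defaultdict(set)
    for symbol in query_symbols:
        # Symbols unknown to the concordance contribute no documents.
        for doc_number in concordance.get(symbol, []):
            overlap[doc_number].add(symbol)
    return dict(overlap)


# Hypothetical stored documents: document number -> search symbols present.
documents = {
    1: {"SS1", "SS3"},
    2: {"SS2", "SS3"},
    3: {"SS1"},
}

concordance = build_concordance(documents)
print(concordance["SS3"])
# → [1, 2]
print(overlap_by_document(concordance, {"SS1", "SS3"})[1] == {"SS1", "SS3"})
# → True
```

Looking documents up by symbol means only the documents that share at least one symbol with the query are ever visited, so the whole data base need not be scanned for each query.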

